Take-home Exercise 1: Application of Spatial Point Patterns Analysis to discover the geographical distribution of Grab hailing services in Singapore

Published

March 26, 2024

1 Introduction

1.1 Setting the Scene

Human mobility, the movement of human beings in space and time, reflects the spatial-temporal characteristics of human behavior. With the advancement Information and Communication Technologies (ICT) especially smart phone, a large volume of data related to human mobility have been collected. By using appropriate GIS analysis methods, these data are potentially useful in supporting smart city planning and management.

In Singapore, one of the important source of data related to human mobility is from Land Transport Authority (LTA) DataMall. Two data sets related to human mobility are provided by the portal, they are: Passenger Volume by Origin Destination Train Stations and Passenger Volume by Origin Destination Bus Stops. One of the limitation of these data sets is that their location are biased to either bus stops or MRT/LRT stations. In 2020, another very interesting human mobility data set called Grab Posisi was released by GRAB, one of the largest shared taxi operator in South-east Asia. There are two data sets been released and one of them is for Singapore.

1.2 Objectives

Geospatial analytics hold tremendous potential to address complex problems facing society. In this study, you are tasked to apply appropriate spatial point patterns analysis methods to discover the geographical and spatio-temporal distribution of Grab hailing services locations in Singapore.

1.3 Tasks

The specific tasks of this take-home exercise are as follows:

  • Using appropriate function of sf and tidyverse, preparing the following geospatial data layer in sf tibble data.frames:

    • Grab taxi location points either by origins or destinations.

    • Road layer within Singapore excluding outer islands.

  • Singapore boundary layer excluding outer islands

  • Using the extracted data, derive traditional Kernel Density Estimation layers.

  • Using the extracted data, derive either Network Kernel Density Estimation (NKDE) or Temporal Network Kernel Density Estimation (TNKDE)

  • Using appropriate tmap functions, display the kernel density layers on openstreetmap of Singapore.

  • Describe the spatial patterns revealed by the kernel density maps.

1.4 Data

Aspatial Data

  • Grab-Posisi of Singapore

Geospatial Data

2 Preparing the data

2.1 Loading the packages

These are the following packages that are being loaded in:

  • sf: Desgined to import, manage and process vector-based geospatial data in R.

  • spatstat: Wide range of useful functions for point pattern analysis.

  • raster: reads, writes, manipulates, analyses and model of gridded spatial data (i.e. raster)

  • maptools: Provides a set of tools for manipulating geographic data.

  • tmap: Provides functions for plotting cartographic quality static point patterns maps or interactive maps by using leaflet API.

  • lubridate: Makes it easier to do the things R does with date-times.

  • arrow: exposes an interface to the Arrow C++ library, enabling access to many of its features in R.s

pacman::p_load(arrow, lubridate, maptools, sp, sf, raster, spatstat, tmap, classInt, viridis, tidyverse, spNetwork)

2.2 Importing the data

Preparing the geospatial data layer in sf tibble data.frames. The tibble package provides opinionated data frames that will make working in the tidyverse easier.

2.2.1 Importing Grab data

The grab data set is extremely big. Therefore, the following code chunks are used to import in all the grab data before appending them together.

grab_00_df <- read_parquet("data/aspatial/GrabPosisi/part-00000-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")

grab_01_df <- read_parquet("data/aspatial/GrabPosisi/part-00001-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")

grab_02_df <- read_parquet("data/aspatial/GrabPosisi/part-00002-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")

grab_03_df <- read_parquet("data/aspatial/GrabPosisi/part-00003-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")

grab_04_df <- read_parquet("data/aspatial/GrabPosisi/part-00004-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")

grab_05_df <- read_parquet("data/aspatial/GrabPosisi/part-00005-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")

grab_06_df <- read_parquet("data/aspatial/GrabPosisi/part-00006-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")

grab_07_df <- read_parquet("data/aspatial/GrabPosisi/part-00007-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")

grab_08_df <- read_parquet("data/aspatial/GrabPosisi/part-00008-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")

grab_09_df <- read_parquet("data/aspatial/GrabPosisi/part-00009-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")
grab_df <- rbind(grab_00_df, grab_01_df, grab_02_df, grab_03_df, grab_04_df, grab_05_df, grab_06_df, grab_07_df, grab_08_df, grab_09_df)

2.2.2 Importing Master Plan 2019 Subzone Boundary (No Sea) data

Next, the Master Plan 2019 Subzone Boundary (No Sea) geospatial data set will be imported using st_read() of the sf package.

mpsz_sf <- st_read(dsn = "data/geospatial/MPSZ", layer = "MPSZ-2019")
Reading layer `MPSZ-2019' from data source 
  `/Users/shanellelam/Library/CloudStorage/OneDrive-SingaporeManagementUniversity/School/Year 3/chanelelele/IS415-GAA/Take-home_Ex/Take-home_Ex01/data/geospatial/MPSZ' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84

Features: 332 Fields: 6 Geometry Type: Multipolygon

2.2.3 Removing the Outer Islands from Master Plan 2019 Subzone Boundary (No Sea) data

To examine the Master Plan 2019 Subzone Boundary (No Sea) geospatial data set:

plot(mpsz_sf)

As observed from the map, the main island has multiple surrounding islands. Examine the data tablle in order to identify the label names of the surrounding islands.

The following code chunk is to remove the surrounding islands:

new_mpsz_sf <- mpsz_sf[!(mpsz_sf$PLN_AREA_N %in% c("WESTERN ISLANDS", "SOUTHERN ISLANDS", "NORTH-EASTERN ISLANDS")), ]

2.2.4 Importing the road data set from OpenStreetMap (Malaysia, Singapore, Brunei)

Next, the road data set from OpenStreetMap (Malaysia, Singapore, Brunei) will be imported using st_read() of the sf package.

osm_roads <- st_read(dsn = "data/geospatial/gis_osm_roads_free_1.shp")
Reading layer `gis_osm_roads_free_1' from data source 
  `/Users/shanellelam/Library/CloudStorage/OneDrive-SingaporeManagementUniversity/School/Year 3/chanelelele/IS415-GAA/Take-home_Ex/Take-home_Ex01/data/geospatial/gis_osm_roads_free_1.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 1767403 features and 10 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 99.66041 ymin: 0.8021131 xmax: 119.2601 ymax: 7.514393
Geodetic CRS:  WGS 84

2.3 Preparing the geospatial data layer

2.3.1 Checking the Content of the MP19 Simple Feature Data Frame

2.3.1.1 Displaying basic information of the MP19 feature class

st_geometry() of the sf package is used to extract and return the geometry component of the object.

st_geometry(new_mpsz_sf)
Geometry set for 326 features 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.21024 xmax: 104.0844 ymax: 1.470775
Geodetic CRS:  WGS 84
First 5 geometries:

2.3.1.2 To learn more about the associated attribute information in the dataframe

glimpse() of the dplyr package is used to l display a concise summary of the structure of the object.

glimpse(new_mpsz_sf)
Rows: 326
Columns: 7
$ SUBZONE_N  <chr> "MARINA EAST", "INSTITUTION HILL", "ROBERTSON QUAY", "FORT …
$ SUBZONE_C  <chr> "MESZ01", "RVSZ05", "SRSZ01", "MUSZ02", "MPSZ05", "BMSZ17",…
$ PLN_AREA_N <chr> "MARINA EAST", "RIVER VALLEY", "SINGAPORE RIVER", "MUSEUM",…
$ PLN_AREA_C <chr> "ME", "RV", "SR", "MU", "MP", "BM", "DT", "SV", "BM", "BM",…
$ REGION_N   <chr> "CENTRAL REGION", "CENTRAL REGION", "CENTRAL REGION", "CENT…
$ REGION_C   <chr> "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR",…
$ geometry   <MULTIPOLYGON [°]> MULTIPOLYGON (((103.8802 1...., MULTIPOLYGON (…

2.3.1.3 To reveal complete information of a feature object

To view the first 5 rows of a feature object:

head(new_mpsz_sf, n=5)  
Simple feature collection with 5 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.8303 ymin: 1.280459 xmax: 103.8988 ymax: 1.297462
Geodetic CRS:  WGS 84
         SUBZONE_N SUBZONE_C      PLN_AREA_N PLN_AREA_C       REGION_N REGION_C
1      MARINA EAST    MESZ01     MARINA EAST         ME CENTRAL REGION       CR
2 INSTITUTION HILL    RVSZ05    RIVER VALLEY         RV CENTRAL REGION       CR
3   ROBERTSON QUAY    SRSZ01 SINGAPORE RIVER         SR CENTRAL REGION       CR
5     FORT CANNING    MUSZ02          MUSEUM         MU CENTRAL REGION       CR
6 MARINA EAST (MP)    MPSZ05   MARINE PARADE         MP CENTRAL REGION       CR
                        geometry
1 MULTIPOLYGON (((103.8802 1....
2 MULTIPOLYGON (((103.8376 1....
3 MULTIPOLYGON (((103.8341 1....
5 MULTIPOLYGON (((103.8472 1....
6 MULTIPOLYGON (((103.8987 1....

2.3.1.4 Plotting the Geospatial data

plot(new_mpsz_sf)

plot(st_geometry(new_mpsz_sf))

plot(new_mpsz_sf["PLN_AREA_N"])

2.3.1.5 Assigning the EPSG Projection to the MP19 Data Frame

st_crs(new_mpsz_sf)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]

The current CRS being used is WGS84, which is for lat/long specifications, while EPSG is a database for CRS and related information.

The following code chunk changes the CRS to EPSG 3414:

mpsz3414 <- st_transform(new_mpsz_sf, 3414)
st_crs(mpsz3414)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

2.3.2 Preparing the osm_roads Layer

2.3.2.1 Basic information of the OSM Roads layer

st_geometry(osm_roads)
Geometry set for 1767403 features 
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 99.66041 ymin: 0.8021131 xmax: 119.2601 ymax: 7.514393
Geodetic CRS:  WGS 84
First 5 geometries:

2.3.2.2 Assigning the EPSG Projection to the OSM Roads layer

st_crs(osm_roads)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]

Similarly, the output showed that the CRS being used is WGS84. Therefore, it needs to be changed to 3414 too.

roads3414 <- st_transform(osm_roads, 3414)
st_crs(roads3414)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(mpsz3414)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

2.3.2.3 Retrieving the roads layer within Singapore

In the following code chunk, st_intersection() of the sf package and is used to extract parts of the the road geometries that intersect with the given region.

sg_roads <- st_intersection(roads3414, mpsz3414)
plot(sg_roads)

2.4 Preparing the Grab Data

glimpse(grab_df)
Rows: 30,329,685
Columns: 9
$ trj_id        <chr> "70014", "73573", "75567", "1410", "4354", "32630", "646…
$ driving_mode  <chr> "car", "car", "car", "car", "car", "car", "car", "car", …
$ osname        <chr> "android", "android", "android", "android", "android", "…
$ pingtimestamp <int> 1554943236, 1555582623, 1555141026, 1555731693, 15555844…
$ rawlat        <dbl> 1.342326, 1.321781, 1.327088, 1.262482, 1.283799, 1.3003…
$ rawlng        <dbl> 103.8890, 103.8564, 103.8613, 103.8238, 103.8072, 103.90…
$ speed         <dbl> 18.910000, 17.719076, 14.021548, 13.026521, 14.812943, 2…
$ bearing       <int> 248, 44, 34, 181, 93, 73, 82, 321, 324, 31, 203, 50, 252…
$ accuracy      <dbl> 3.900, 4.000, 3.900, 4.000, 3.900, 3.900, 3.000, 3.649, …

As observed, the pingtimestamp is in the wrong data type format. It should be in date/time format and not integer.

2.4.1 Adding Timestamp to the Grab datapoints

Code chunk to convert the data type of pingtimestamp from character to date-time:

grab_df$pingtimestamp <- as_datetime(grab_df$pingtimestamp)

The following code chunk is used to save your object as an RDS file in R for storage:

write_rds(grab_df, "data/aspatial/rds/part0.rds")

2.4.2 Preparing the Origin Grab taxi location points

2.4.2.1 Extracting the Origin locations

The following code chunk extracts the trips’ origin locations. It derives three new columns (i.e. variables) for weekday, starting hour and day of the month. It then names the output tibble dataframe origin_df.

origin_df <- grab_df %>% 
  group_by(trj_id) %>% 
  arrange(pingtimestamp) %>% 
  filter(row_number()==1) %>%
  mutate(weekday = wday(pingtimestamp,
                        label=TRUE,
                        abbr=TRUE),
         start_hr = factor(hour(pingtimestamp)),
         day = factor(mday(pingtimestamp)))

Saving the df for future use:

write_rds(origin_df, "data/aspatial/rds/origin_df.rds")

Importing the df:

origin_df <- read_rds("data/aspatial/rds/origin_df.rds")

2.4.2.2 Converting aspatial data into geospatial data

Converting the origin dfs into a sf tibble data frame by using it’s location information.

origin_sf <- st_as_sf(origin_df,
                      coords = c("rawlng", "rawlat"),
                      crs = 4326) %>%
  st_transform(crs = 3414)

2.4.2.3 Visualising the data

ggplot functions are used to reveal the distribution of origin trips by day of the week.

Visualising frequency distribution:

ggplot(data=origin_df, 
       aes(x=weekday)) + 
  geom_bar()

tmap functions are used to plot a point symbol map by using the origin trips locations.

Visualising as Point Symbol Map:

tmap_mode("plot")
tm_shape(origin_sf) +
  tm_dots()

3 Deriving the traditional Kernel Density Estimation layer

3.1 Geospatial Data Wrangling

3.1.1 Converting sf data frames into generic spatstat’s ppp format

The following code chunks are the 3 steps requried to convert the sf data frames into a generic spatstat’s ppp format.

The code chunk below uses as_Spatial() of sf package to convert the three geospatial data from simple feature data frame to sp’s Spatial* class.

origin <- as_Spatial(origin_sf)
mpsz <- as_Spatial(mpsz_sf)
mpsz2 <- as_Spatial(mpsz3414)

spatstat requires the analytical data in ppp object form. However, as there is no direct way to convert a Spatial* classes into ppp object, the Spatial classes* need to be converted into Spatial objects first.

origin_sp <- as(origin, "SpatialPoints")

Lastly, the as.ppp() function of spatstat is used to convert the spatial data into spatstat’s ppp object format.

origin_ppp <- as.ppp(origin_sf)
origin_ppp
Marked planar point pattern: 28000 points
marks are of storage type  'character'
window: rectangle = [3628.24, 49845.23] x [25198.14, 49689.64] units

The following code chunks are used to examine the difference.

plot(origin_ppp)

To examine the summary statistics of the newly created ppp object:

summary(origin_ppp)
Marked planar point pattern:  28000 points
Average intensity 2.473666e-05 points per square unit

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

marks are of type 'character'
Summary:
   Length     Class      Mode 
    28000 character character 

Window: rectangle = [3628.24, 49845.23] x [25198.14, 49689.64] units
                    (46220 x 24490 units)
Window area = 1131920000 square units

3.1.2 Handling duplicated points

To check the duplication in a ppp object:

any(duplicated(origin_ppp))
[1] FALSE

As observed, there are no duplicates in the ppp object.

However, the jittering approach can be used to remove duplicates if necessary. It adds a small perturbation to the duplicate points so that they do not occupy the exact same space.

origin_ppp_jit <- rjitter(origin_ppp, 
                             retry=TRUE, 
                             nsim=1, 
                             drop=TRUE)

3.1.3 Creating Coastal Outline

sg_sf <- mpsz3414 %>%
  st_union()
plot(sg_sf)

3.1.4 Creating owin object

In the spatstat package, owin objects are used to rrepresent polygonal regions such as the Singapore boundary geographical area.

The code chunk below is used to covert sg SpatialPolygon object into owin object of spatstat:

sg_owin <- as.owin(sg_sf)

3.1.5 Combining point events object and owin object

Extracting grab points that are located within Singapore by using the code chunk below.

originSG_ppp = origin_ppp[sg_owin]

The output object would then combine both the point and polygon feature in one ppp object class as shown below.

summary(originSG_ppp)
Marked planar point pattern:  27872 points
Average intensity 4.193821e-05 points per square unit

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

marks are of type 'character'
Summary:
   Length     Class      Mode 
    27872 character character 

Window: polygonal boundary
37 separate polygons (29 holes)
                  vertices         area relative.area
polygon 1            12666  6.63014e+08      9.98e-01
polygon 2              285  1.61128e+06      2.42e-03
polygon 3               27  1.50315e+04      2.26e-05
polygon 4 (hole)        41 -4.01660e+04     -6.04e-05
polygon 5 (hole)       317 -5.11280e+04     -7.69e-05
polygon 6 (hole)         3 -4.14100e-04     -6.23e-13
polygon 7               30  2.80002e+04      4.21e-05
polygon 8 (hole)         4 -2.86396e-01     -4.31e-10
polygon 9 (hole)         3 -1.81439e-04     -2.73e-13
polygon 10 (hole)        3 -8.68789e-04     -1.31e-12
polygon 11 (hole)        3 -5.99535e-04     -9.02e-13
polygon 12 (hole)        3 -3.04560e-04     -4.58e-13
polygon 13 (hole)        3 -4.46076e-04     -6.71e-13
polygon 14 (hole)        3 -3.39794e-04     -5.11e-13
polygon 15 (hole)        3 -4.52043e-05     -6.80e-14
polygon 16 (hole)        3 -3.90173e-05     -5.87e-14
polygon 17 (hole)        3 -9.59850e-05     -1.44e-13
polygon 18 (hole)        4 -2.54488e-04     -3.83e-13
polygon 19 (hole)        4 -4.28453e-01     -6.45e-10
polygon 20 (hole)        4 -2.18616e-04     -3.29e-13
polygon 21 (hole)        5 -2.44411e-04     -3.68e-13
polygon 22 (hole)        5 -3.64686e-02     -5.49e-11
polygon 23              71  8.18750e+03      1.23e-05
polygon 24 (hole)        6 -8.37554e-01     -1.26e-09
polygon 25 (hole)       38 -7.79904e+03     -1.17e-05
polygon 26 (hole)        3 -3.41897e-05     -5.14e-14
polygon 27 (hole)        3 -3.65499e-03     -5.50e-12
polygon 28 (hole)        3 -4.95057e-02     -7.45e-11
polygon 29              91  1.49663e+04      2.25e-05
polygon 30 (hole)        5 -2.92235e-04     -4.40e-13
polygon 31 (hole)        3 -7.43616e-06     -1.12e-14
polygon 32 (hole)      270 -1.21455e+03     -1.83e-06
polygon 33 (hole)       19 -4.39650e+00     -6.62e-09
polygon 34 (hole)       35 -1.38385e+02     -2.08e-07
polygon 35 (hole)       23 -1.99656e+01     -3.00e-08
polygon 36              71  5.63061e+03      8.47e-06
polygon 37              10  1.99717e+02      3.01e-07
enclosing rectangle: [2667.54, 55941.94] x [21448.47, 50256.33] units
                     (53270 x 28810 units)
Window area = 664597000 square units
Fraction of frame area: 0.433
plot(originSG_ppp)

3.2 First-order Spatial Point Patterns Analysis

3.2.1 Kernel Density Estimation

Deriving kernel density estimation (KDE) layer for visualising and exploring the intensity of point processes using spatstat package.

3.2.1.1 Computing kernel density estimation using automatic bandwidth selection method

The code chunk below computes a kernel density by using the following configurations of density() of spatstat.

In addition, bw.diggle() is an automatic bandwidth selection method. It uses cross-validation to select a smoothing bandwidth for the kernel estimation of point process intensity.

kde_originSG_bw <- density(originSG_ppp,
                              sigma=bw.diggle,
                              edge=TRUE,
                            kernel="gaussian") 

To display the kernel density derived:

plot(kde_originSG_bw)

The density values of the output range from 0 to 0.0015 which is way too small to comprehend. This is because the default unit of measurement of svy21 is in meter. As a result, the density values computed is in “number of points per square meter”.

To retrieve the bandwidth:

bw <- bw.diggle(originSG_ppp)
bw
   sigma 
8.087057 

3.2.1.2 Rescalling KDE value

In the code chunk below, rescale() is used to covert the unit of measurement from meter to kilometer:

originSG_ppp.km <- rescale(originSG_ppp, 1000, "km")

Re-running density() using the resale data set and plot the output kde map:

kde_originSG.bw <- density(originSG_ppp.km, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
plot(kde_originSG.bw)

As observed, the output image looks identical to the earlier version, the only changes in the data values.

3.2.2 Working with different automatic bandwidth methods

bw.ppl() algorithm is being used to determine the bandwidth as it tends to produce the more appropriate values when the pattern consists predominantly of tight clusters.

bw.ppl(originSG_ppp.km)
    sigma 
0.1238747 

bw.diggle() method is to detect a single tight cluster in the midst of random noise.

bw.diggle(originSG_ppp.km)
      sigma 
0.008087057 

The code chunk beow will be used to compare the output of using bw.diggle and bw.ppl methods:

kde_originSG.ppl <- density(originSG_ppp.km, 
                               sigma=bw.ppl, 
                               edge=TRUE,
                               kernel="gaussian")
par(mfrow=c(2,2), mar = c(4, 4, 2, 1))
plot(kde_originSG.bw, main = "bw.diggle")
plot(kde_originSG.ppl, main = "bw.ppl")

3.2.3 Working with different kernel methods

The code chunk below will be used to compute the three more kernel density estimations by using these three kernel functions: Epanechnikov, Quartic and Dics.

par(mfrow=c(2,2), mar = c(4, 4, 2, 1))
plot(density(originSG_ppp.km, 
             sigma=bw.ppl, 
             edge=TRUE, 
             kernel="gaussian"), 
     main="Gaussian")
plot(density(originSG_ppp.km, 
             sigma=bw.ppl, 
             edge=TRUE, 
             kernel="epanechnikov"), 
     main="Epanechnikov")
plot(density(originSG_ppp.km, 
             sigma=bw.ppl, 
             edge=TRUE, 
             kernel="quartic"), 
     main="Quartic")
plot(density(originSG_ppp.km, 
             sigma=bw.ppl, 
             edge=TRUE, 
             kernel="disc"), 
     main="Disc")

Originally, it kept returning that the figure margins were too large. Therefore, I added the mar() to increase the plotting device size to allow more space for the plots. (can be adjusted according to preference)

3.2.4 Fixed and Adaptive KDE

3.2.4.1 Computing KDE by using fixed bandwidth

kde_originSG_600 <- density(originSG_ppp.km, sigma=0.6, edge=TRUE, kernel="gaussian")
plot(kde_originSG_600)

3.2.4.2 Computing KDE by using adaptive bandwidth

However, since fixed bandwidth method is very sensitive to highly skew distribution of spatial point patterns over geographical units, one way to overcome this problem is by using adaptive bandwidth instead.

Deriving adaptive density estimation by using density.adaptive() of spatstat:

kde_originSG_adaptive <- adaptive.density(originSG_ppp.km, method="kernel")
plot(kde_originSG_adaptive)

par(mfrow=c(1,2))
plot(kde_originSG.bw, main = "Fixed bandwidth")
plot(kde_originSG_adaptive, main = "Adaptive bandwidth")

3.2.5 Converting KDE output into grid object.

Converting it so that it is suitable for mapping purposes.

gridded_kde_originSG_bw <- as.SpatialGridDataFrame.im(kde_originSG.bw)
spplot(gridded_kde_originSG_bw)

3.2.5.1 Converting gridded output into raster

Converting the gridded kernel density objects into RasterLayer object by using raster() package.

kde_originSG_bw_raster <- raster(gridded_kde_originSG_bw)
kde_originSG_bw_raster
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 0.4162063, 0.2250614  (x, y)
extent     : 2.667538, 55.94194, 21.44847, 50.25633  (xmin, xmax, ymin, ymax)
crs        : NA 
source     : memory
names      : v 
values     : -3.745309e-13, 2295.244  (min, max)

As observed, the CRS is currently NA. Therefore, the code chunk below will be used to include the CRS information to EPSG 3414.

Assigning projection systems:

projection(kde_originSG_bw_raster) <- CRS("+init=EPSG:3414")
kde_originSG_bw_raster
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 0.4162063, 0.2250614  (x, y)
extent     : 2.667538, 55.94194, 21.44847, 50.25633  (xmin, xmax, ymin, ymax)
crs        : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m +no_defs 
source     : memory
names      : v 
values     : -3.745309e-13, 2295.244  (min, max)

3.2.5.2 Visualising the output in tmap

Displaying the raster in cartographic quality map using tmap package:

tm_shape(kde_originSG_bw_raster) + 
  tm_raster("v") +
  tm_layout(legend.position = c("right", "bottom"), frame = FALSE)

3.2.6 Comparing Spatial Point Patterns using KDE

Comparing the KDE of grab origins in Downtown Core, Serangoon, Orchard and Choa Chu Kang.

3.2.6.1 Extracting study area

The code chunk below will be used to extract the target planning areas:

dc = mpsz2[mpsz2@data$PLN_AREA_N == "DOWNTOWN CORE",]
se = mpsz2[mpsz2@data$PLN_AREA_N == "SERANGOON",]
or = mpsz2[mpsz2@data$PLN_AREA_N == "ORCHARD",]
ck = mpsz2[mpsz2@data$PLN_AREA_N == "CHOA CHU KANG",]

Plotting target planning areas:

par(mfrow=c(2,2), mar = c(4, 4, 2, 1))
plot(dc, main = "Downtown Core")
plot(se, main = "Serangoon")
plot(or, main = "Orchard")
plot(ck, main = "Choa Chu Kang")

3.2.6.2 Converting the spatial point data frame into generic sp format

Convert these SpatialPolygonsDataFrame layers into generic spatialpolygons layers:

dc_sp = as(dc, "SpatialPolygons")
se_sp = as(se, "SpatialPolygons")
or_sp = as(or, "SpatialPolygons")
ck_sp = as(ck, "SpatialPolygons")

3.2.6.3 Creating owin object

Converting these SpatialPolygons objects into owin objects that is required by spatstat:

dc_owin = as(dc_sp, "owin")
se_owin = as(se_sp, "owin")
or_owin = as(or_sp, "owin")
ck_owin = as(ck_sp, "owin")

3.2.6.4 Combining origin points and the study area

To extract the origin points that are within the specific region:

origin_dc_ppp = origin_ppp_jit[dc_owin]
origin_se_ppp = origin_ppp_jit[se_owin]
origin_or_ppp = origin_ppp_jit[or_owin]
origin_ck_ppp = origin_ppp_jit[ck_owin]

rescale() function is used to transform the unit of measurement from metre to kilometre:

origin_dc_ppp.km = rescale(origin_dc_ppp, 1000, "km")
origin_se_ppp.km = rescale(origin_se_ppp, 1000, "km")
origin_or_ppp.km = rescale(origin_or_ppp, 1000, "km")
origin_ck_ppp.km = rescale(origin_ck_ppp, 1000, "km")

To plot these four study areas and the locations of the origin points:

par(mfrow=c(2,2), mar = c(4, 4, 2, 1))
plot(origin_dc_ppp.km, main="Downtown Core")
plot(origin_se_ppp.km, main="Serangoon")
plot(origin_or_ppp.km, main="Orchard")
plot(origin_ck_ppp.km, main="Choa Chu Kang")

3.2.6.5 Computing KDE

The code chunk below will be used to compute the KDE of these four planning areas. bw.diggle method is used to derive the bandwidth of each:

par(mfrow=c(2,2), mar = c(4, 4, 2, 1))
plot(density(origin_dc_ppp.km, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Downtown Core")
plot(density(origin_se_ppp.km, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Serangoon")
plot(density(origin_or_ppp.km, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Orchard")
plot(density(origin_ck_ppp.km, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Choa Chu Kang")

3.2.6.6 Computing fixed bandwidth KDE

par(mfrow=c(2,2), mar = c(4, 4, 2, 1))
plot(density(origin_dc_ppp.km, 
             sigma=0.25, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Downtown Core")
plot(density(origin_se_ppp.km, 
             sigma=0.25, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Serangoon")
plot(density(origin_or_ppp.km, 
             sigma=0.25, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Orchard")
plot(density(origin_ck_ppp.km, 
             sigma=0.25, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Choa Chu Kang")

As observed, the fixed bandwidth used a constant smoothing parameter for the dataset while the regular KDE adjusts the bandwidth dynamically based on the local density of the data.

From the fixed bandwdith KDE calculation, the contrast between the values are more significant. The graph suggests that Orchard area has the highest probability of origin points.

3.2.7 Nearest Neighbour Analysis

3.2.7.1 Testing spatial point patterns using Clark and Evans Test

Performing the Clark-Evans test of aggregation for a spatial point pattern by using clarkevans.test() of statspat:

The test hypotheses are:

Ho = The distribution of grab origin points are randomly distributed.

H1= The distribution of grab origin points are not randomly distributed.

The 95% confident interval will be used.

clarkevans.test(originSG_ppp,
                correction="none",
                clipregion="sg_owin",
                alternative=c("clustered"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Z-test

data:  originSG_ppp
R = 0.27981, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)

Analysis of results:

The test results suggest that there is evidence to reject the null hypothesis.

The alternative hypothesis indicates clustering (R < 1), which implies that the observed spatial pattern of points in the originSG_ppp data is significantly clustered.

The low p-value (< 2.2e-16) indicates a high level of statistical significance, reinforcing the conclusion that the clustering is not likely due to random chance.

3.2.7.2 Clark and Evans Test: Downtown Core planning area

In the code chunk below, clarkevans.test() of spatstat is used to performs Clark-Evans test of aggregation for childcare centre in the Downtown Core planning area.

clarkevans.test(origin_dc_ppp,
                correction="none",
                clipregion=NULL,
                alternative=c("two.sided"),
                nsim=999)

    Clark-Evans test
    No edge correction
    Z-test

data:  origin_dc_ppp
R = 0.5306, p-value < 2.2e-16
alternative hypothesis: two-sided

Analysis of results:

The test results suggest to reject the null hypothesis. The alternative hypothesis implies that the spatial pattern of points in the origin_dc_ppp data is significantly different from a random distribution.

The low p-value (< 2.2e-16) indicates a high level of statistical significance, supporting the conclusion that the spatial pattern is not likely due to random chance.

3.3 Network Kernel Density Estimation (NKDE)

3.3.1 Extracting the data

dc_sf <- st_as_sf(dc_sp)
dc_roads <- st_intersection(dc_sf, roads3414)

3.3.2 Visualising the Geospatial data

plot(dc_roads)

Code chunk below can be used to print the content of Downtown Core roads SpatialLineDataFrame and Downtown Core SpatialPointsDataFrame by using the code chunk below.

str(dc_sf)
Classes 'sf' and 'data.frame':  13 obs. of  1 variable:
 $ geometry:sfc_POLYGON of length 13; first list element: List of 1
  ..$ : num [1:85, 1:2] 29201 29194 29194 29193 29191 ...
  ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: 
  ..- attr(*, "names")= chr(0) 
str(dc_roads)
Classes 'sf' and 'data.frame':  3525 obs. of  11 variables:
 $ osm_id  : chr  "16688065" "21560685" "21560688" "21567384" ...
 $ code    : int  5112 5114 5114 5113 5115 5115 5115 5115 5113 5113 ...
 $ fclass  : chr  "trunk" "secondary" "secondary" "primary" ...
 $ name    : chr  "Nicoll Highway" "Beach Road" "Beach Road" "Hill Street" ...
 $ ref     : chr  NA NA NA NA ...
 $ oneway  : chr  "F" "F" "F" "F" ...
 $ maxspeed: int  70 50 50 60 50 50 50 50 50 50 ...
 $ layer   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ bridge  : chr  "F" "F" "F" "F" ...
 $ tunnel  : chr  "F" "F" "F" "F" ...
 $ geometry:sfc_GEOMETRY of length 3525; first list element:  'XY' num [1:5, 1:2] 30398 30402 30405 30422 30446 ...
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA
  ..- attr(*, "names")= chr [1:10] "osm_id" "code" "fclass" "name" ...

To visualise the geospatial data with high cartographic quality and interactive manner, the mapping function of tmap package can be used as shown in the code chunk below:

tmap_mode('view')
tm_shape(dc_sf) + 
  tm_dots() + 
  tm_shape(dc_roads) +
  tm_lines()
tmap_mode('plot')

3.3.3 Network Constrained KDE (NetKDE) Analysis

Before creating the lixels objects, it is essential to note that the geometry type needs to be transformed to LINESTRING:

dc_roads <- st_cast(dc_roads, "LINESTRING")
dc_roads
Simple feature collection with 3525 features and 10 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 28896.52 ymin: 28002.47 xmax: 31924.98 ymax: 31852.13
Projected CRS: SVY21 / Singapore TM
First 10 features:
     osm_id code    fclass             name  ref oneway maxspeed layer bridge
1  16688065 5112     trunk   Nicoll Highway <NA>      F       70     0      F
2  21560685 5114 secondary       Beach Road <NA>      F       50     0      F
3  21560688 5114 secondary       Beach Road <NA>      F       50     0      F
4  21567384 5113   primary      Hill Street <NA>      F       60     0      F
5  21567394 5115  tertiary      High Street <NA>      F       50     0      F
6  21567404 5115  tertiary  North Boat Quay <NA>      F       50     0      F
7  21573270 5115  tertiary Parliament Place <NA>      F       50     0      F
8  21573273 5115  tertiary   Coleman Street <NA>      F       50     0      F
9  21581796 5113   primary    Stamford Road <NA>      F       50     0      F
10 21581796 5113   primary    Stamford Road <NA>      F       50     0      F
   tunnel                       geometry
1       F LINESTRING (30397.99 30427....
2       F LINESTRING (30377.85 30711....
3       F LINESTRING (31566.64 31767....
4       F LINESTRING (29927.63 30760....
5       F LINESTRING (29730.71 30378....
6       F LINESTRING (29798.53 30202....
7       F LINESTRING (29903.95 30252....
8       F LINESTRING (30043.1 30410.8...
9       F LINESTRING (30413.81 30413....
10      F LINESTRING (30404.77 30421....

3.3.3.1 Preparing the lixels objects

Before computing NetKDE, the SpatialLines object need to be cut into lixels with a specified minimal distance. This task can be performed by using with lixelize_lines() of spNetwork as shown in the code chunk below.

lixels <- lixelize_lines(dc_roads, 
                         700, 
                         mindist = 350)

3.3.3.2 Generating line centre points

Next, lines_center() of spNetwork is used to generate a SpatialPointsDataFrame (i.e. samples) with line centre points as shown in the code chunk below.

samples <- lines_center(lixels)

The points are located at center of the line based on the length of the line.

3.3.3.3 Performing NetKDE

Before computing NetKDE, it is essential to note that the geometry type needs to be transformed to points:

st_geometry_type(dc_sf)
 [1] POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON POLYGON
[10] POLYGON POLYGON POLYGON POLYGON
18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE

Convert polygons to points:

dc_sf_points <- st_centroid(dc_sf)

Check the geometry type after conversion:

st_geometry_type(dc_sf_points)
 [1] POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT
[13] POINT
18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE

Computing NetKDE:

densities <- nkde(dc_roads, 
                  events = dc_sf_points,
                  w = rep(1,nrow(dc_sf_points)),
                  samples = samples,
                  kernel_name = "quartic",
                  bw = 300, 
                  div= "bw", 
                  method = "simple", 
                  digits = 1, 
                  tol = 1,
                  grid_shape = c(1,1), 
                  max_depth = 8,
                  agg = 5, #we aggregate events within a 5m radius (faster calculation)
                  sparse = TRUE,
                  verbose = FALSE)

3.3.3.4 Visualising NetKDE

Before visualising the NetKDE values, the code chunk below will be used to insert the computed density values (i.e. densities) into samples and lixels objects as density field.

samples$density <- densities
lixels$density <- densities

Since svy21 projection system is in meter, the computed density values are very small i.e. 0.0000005. The code chunk below is used to resale the density values from number of events per meter to number of events per kilometer.

# rescaling to help the mapping
samples$density <- samples$density*1000
lixels$density <- lixels$density*1000

The code below uses appropriate functions of tmap package to prepare interactive and high cartographic quality map visualisation.

tmap_mode('view')
tm_shape(lixels)+
  tm_lines(col="density")+
tm_shape(dc_sf_points)+
  tm_dots()
tmap_mode('plot')

The interactive map above effectively reveals road segments (darker color) with relatively higher density of origin points than road segments with relatively lower density of origin points (lighter color).

3.3.4 Network Constrained G- and K-Function Analysis

Performing complete spatial randomness (CSR) test by using kfunctions() of spNetwork package.

The null hypothesis is defined as: The observed spatial point events (i.e distribution of origin points) are uniformly distributed over a street network in Downtown Core Planning Area.

The CSR test is based on the assumption of the binomial point process which implies the hypothesis that the origin points are randomly and independently distributed over the street network.

If this hypothesis is rejected, we may infer that the distribution of origin points are spatially interacting and dependent on each other; as a result, they may form non-random patterns.

kfun_dcorigins <- kfunctions(dc_roads, 
                             dc_sf_points,
                             start = 0, 
                             end = 1000, 
                             step = 50, 
                             width = 50, 
                             nsim = 50, 
                             resolution = 50,
                             verbose = FALSE, 
                             conf_int = 0.05)

Visualising the ggplot2 object of k-function by using the code chunk below:

kfun_dcorigins$plotk

The blue line is the empirical network K-function of the origin points in Downtown Core planning area. The gray envelop represents the results of the 50 simulations in the interval 2.5% - 97.5%. Because the blue line does not fall below the gray area, we can infer that the origin points in Downtown Core planning area does not resemble normal distribution.

4 Conclusion

From the KDE Plots, we can see clearly that the central areas that are for work have a higher probability of having grab origin points in comparison to residential areas.

Utilising spatial point patterns analysis we can analyse that the distribution of origin points are spatially interacting and dependent on each other; as a result, they may form non-random patterns.